School-age outcomes among IVF-conceived children: A population-wide cohort study

Background
In vitro fertilisation (IVF) is a common mode of conception. Understanding the long-term implications for children conceived via IVF is important. The aim of this study was to determine the causal effect of IVF conception on primary school-age childhood developmental and educational outcomes, compared with outcomes following spontaneous conception.

Methods and findings
Causal inference methods were used to analyse observational data in a way that emulates a target randomised clinical trial. The study cohort comprised statewide linked maternal and childhood administrative data. Participants included singleton infants conceived spontaneously or via IVF, born in Victoria, Australia between 2005 and 2014 and who had school-age developmental and educational outcomes assessed. The exposure examined was conception via IVF, with spontaneous conception the control condition. Two outcome measures were assessed. The first, childhood developmental vulnerability at school entry (age 4 to 6), was assessed using the Australian Early Development Census (AEDC) (n = 173,200) and defined as scoring <10th percentile in ≥2 of 5 developmental domains (physical health and wellbeing, social competence, emotional maturity, language and cognitive skills, and communication skills and general knowledge). The second, educational outcome at age 7 to 9, was assessed using National Assessment Program–Literacy and Numeracy (NAPLAN) data (n = 342,311) and defined by overall z-score across 5 domains (grammar and punctuation, reading, writing, spelling, and numeracy). Inverse probability weighting with regression adjustment was used to estimate population average causal effects. The study included 412,713 children across the 2 outcome cohorts. Linked records were available for 4,697 IVF-conceived cases and 168,503 controls for AEDC, and 8,976 cases and 333,335 controls for NAPLAN.
There was no causal effect of IVF conception on the risk of developmental vulnerability at school entry compared with spontaneously conceived children (AEDC metrics), with an adjusted risk difference of −0.3% (95% CI −3.7% to 3.1%) and an adjusted risk ratio of 0.97 (95% CI 0.77 to 1.25). At age 7 to 9 years, there was no causal effect of IVF conception on the NAPLAN overall z-score, with an adjusted mean difference of 0.030 (95% CI −0.018 to 0.077) between IVF- and spontaneously conceived children. The models were adjusted for sex at birth, age at assessment, language background other than English, socioeconomic status, maternal age, parity, and education. Study limitations included the use of observational data, the potential for unmeasured confounding, the presence of missing data, and the necessary restriction of the cohort to children attending school.

Conclusions
In this analysis, under the given causal assumptions, the school-age developmental and educational outcomes for children conceived by IVF are equivalent to those of spontaneously conceived children. These findings provide important reassurance for current and prospective parents and for clinicians.



Author summary
Why was this study done?
• More than 8 million children have been conceived globally with the assistance of in vitro fertilisation (IVF).
• Some studies suggest these children have an increased risk of congenital abnormalities, autism spectrum disorder, developmental delay, and intellectual disability.
• Educational and school-age developmental outcomes following IVF conception have not yet been adequately characterised.
What did the researchers do and find?
• Using statewide, linked population data from Victoria, Australia, we investigated the school-age developmental and educational outcomes for children born following IVF-assisted conception.
• The study examined 2 separate assessments of school-age development and educational outcomes among 585,659 children, including 11,059 children who were conceived via IVF.

Introduction
In vitro fertilisation (IVF) is a common mode of conception worldwide [1]. Since the first successful IVF birth in 1978, more than 8 million babies have been born globally following IVF conception [2,3]. In Australia, it is now estimated that 1 in 20 babies are born following IVF conception [4,5]. As the number of children born following IVF conception continues to rise, a deeper understanding of the long-term implications for these children is important. It is well established that there are increased risks of maternal and perinatal complications following IVF conception [6][7][8]. Large cohort studies have suggested an increase in the frequency of congenital abnormalities, autism spectrum disorder, developmental delay, and intellectual disability in children conceived via IVF or intracytoplasmic sperm injection (ICSI) techniques [9][10][11][12][13]. However, reports detailing longer term outcomes after IVF beyond the neonatal period remain sparse.
Educational and cognitive outcomes following IVF conception have not yet been thoroughly investigated. Several small cohort studies [14][15][16][17] have reported conflicting results. One large population-based study suggested a small difference in school performance in favour of spontaneous conception [18]. Another population study recently concluded that school performance was not adversely affected by the process of IVF but, rather, the condition of subfertility [19].
Parents of IVF- and spontaneously conceived children possess inherently different health and sociodemographic characteristics [20,21]. Factors such as increased maternal age and higher education are known to be associated with both the use of fertility treatment and better early childhood outcomes [22][23][24]. It is thus critical that such factors are appropriately acknowledged when examining the association between mode of conception and childhood outcomes. Proper adjustment in any statistical analysis is required before any association can be given a causal interpretation.
Our study aimed to overcome some of the limitations of the analysis of observational (nonrandomised) data by using a causal inference approach that seeks to emulate the results of a randomised comparison in a clinical trial (Table 1) [25,26]. This analytical approach attempts to simulate a randomised trial by (1) requiring an a priori statistical analysis protocol; (2) addressing a causal question reflecting the effect of an intervention at a specific clinical decision point on a prespecified outcome; and (3) using inverse probability weighting via propensity score (PS) models to balance the outcome propensity differences between exposed and control populations, with the aim of producing exchangeable comparison groups [27]. This allowed us to estimate the population-average effect of mode of conception (IVF versus spontaneous conception) on childhood developmental and educational outcomes with a causal interpretation. Our study aims to estimate the total causal effect of IVF conception on school-age childhood developmental and educational outcomes using a causal inference approach and employing the necessary assumptions.

Study design
This study is reported as per the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline (Checklist in S13 File).
Population. The study population included all singleton livebirths in Victoria between 2005 and 2014. Twins and higher order multiple births were excluded. Perinatal information was collected from audited birth outcome data through the Victorian Perinatal Data Collection (VPDC) [28,29]. The 3 largest IVF units in Victoria provided maternal records from all cycles that resulted in a birth during the study period. Creation of linked maternal/child data pairs required matching of the VPDC data with birth records, which were obtained from the Victorian Births, Deaths and Marriage registry.
Exposure. The exposure was conception via IVF compared with spontaneous conception. The term "IVF" is used collectively to include conventional IVF, IVF with ICSI, and associated laboratory techniques. IVF cases were identified through the IVF database. Victorian births not identified in the IVF database were allocated to the control group. Pregnancies recorded as "IVF conception" in the VPDC but not identified within the IVF database were excluded, ensuring the control group did not contain any IVF conceptions. These cases likely represent overseas or interstate IVF conceptions, Victorian IVF conceptions not captured by our database, or failed linkages between the IVF database and state birth records.

Table 1 (target trial compared with its emulation using observational data)

Inclusion criteria (emulation): Same as target trial. Limitation: It is not possible to collect data on failed attempts, miscarriages, or stillbirths (livebirth bias). Livebirth bias introduces potential selection bias, collider bias, and "depletion of susceptible cases" (see below).
Exclusion criteria (emulation): Same as target trial; the same exclusion criteria can be applied, which is important to ensure positivity is maintained between exposure groups.

(ii) Treatment strategies
Target trial: (A) Spontaneous ("in vivo") conception leading to a live birth; (B) Conception aided by IVF leading to a live birth.
Emulation: (A) Same as target trial; (B) Same as target trial. Positivity assumption: Every subject could potentially be included in either exposure group. This is both a design feature and a matter of statistical adjustment, where the positivity assumption is addressed by dataset trimming to ensure overlap of inverse probability weights.

(iii) Assignment procedures
Target trial: Randomised at entry (decision to conceive) vs. modified intention to treat (ITT): randomised at decision to conceive and included in trial after subsequent live birth.
Emulation: Modified ITT: time commences from livebirth. DAG used to identify prespecified covariates to be included in estimator models for both confounding control and outcome adjustment. Estimator used a doubly robust inverse-probability-weighted regression adjustment model to achieve adherence to ignorability and positivity assumptions, such that observed groups can be considered balanced or exchangeable.

(iv) Follow-up period
Target trial: AEDC: school entry, 4-6 years of age. NAPLAN: Grade 3, 7-9 years of age. Some trial participants would be lost to follow-up.
Emulation: Same as target trial.
Considerations and limitations:
Post-exposure and pre-selection: live birth bias. Post-exposure and post-selection: follow-up divided into (1) postnatal loss (akin to loss to follow-up), and (2) unobserved confounding.
(1) Postnatal: This is distinct from livebirth bias and covers loss to follow-up after "live birth" due to neonatal death, infant death, childhood death, nonschool attendance, or missing outcome data.
(i) Dataset does not contain child death data.
(iia) Missing data due to disability to be analysed as sensitivity analyses (AEDC: special needs, NAPLAN: exempt) (see below under outcome).
(iib) Dataset does not contain data on children who do not attend school, i.e., severe disability (limitation for discussion).

AEDC
(1) Primary analysis: "special needs missing" coded as "vulnerable." Therefore, no missing outcome data or exposure data, and imputation of missing covariate data only performed.
(2) Sensitivity analyses: (a) excluding special needs cases completely and (b) imputing their outcomes (biased); these data are missing not at random (MNAR) and thus this is performed for illustration.

NAPLAN
Emulation: Same NAPLAN outcome measure as target trial.
• Children commence school at different ages at the start of the calendar year and the assessment is not adjusted for age range (7-9), thus age at assessment will be included in the model for standardisation.
• The NAPLAN paper is different each year, thus year of assessment is included in the model.
Children exempt from sitting NAPLAN are, by definition, below the NAPLAN national minimum standard for each domain from which they have been excluded, thus "missing not at random."
Methods for addressing NAPLAN "exempt" status:
(1) Primary analysis: exempt coded as either the lowest possible test z-score (continuous) or below the national minimum standard (binary) NAPLAN domain outcomes. Therefore, no missing outcome data or exposure data, and imputation of missing covariate data performed.
(2) Sensitivity analyses: (a) excluding exempt cases completely and (b) imputing their outcomes (biased); these data are MNAR and thus this was performed for illustration only.

(vi) Causal contrasts of interest and analysis plan
Main outcome measures. Childhood educational and developmental outcomes were assessed using 2 standardised national assessments: the Australian Early Development Census (AEDC) [30] and the National Assessment Program-Literacy and Numeracy (NAPLAN) [31]. See the Supporting information file (Methods in S1 File) for a detailed description of each measure.
Australian Early Development Census (AEDC). The AEDC assesses broad childhood functional development at school entry (age 4 to 6) across 5 domains: physical health and wellbeing; social competence; emotional maturity; language and cognitive skills (school-based); and communication skills and general knowledge. The primary AEDC outcome for this study was a global measure, developmental vulnerability, defined as scoring <10th percentile in ≥2 of the 5 developmental domains. The secondary outcomes included developmental vulnerability in each of the 5 domains.
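The primary AEDC outcome definition above can be illustrated with a minimal sketch. It assumes each child's domain results are already available as percentile ranks; the function name, the dictionary layout, and the use of percentile ranks as input are illustrative assumptions, not part of the AEDC instrument or of the study's code.

```python
from typing import Mapping

# The 5 AEDC domains as named in the text.
AEDC_DOMAINS = (
    "physical health and wellbeing",
    "social competence",
    "emotional maturity",
    "language and cognitive skills",
    "communication skills and general knowledge",
)


def is_developmentally_vulnerable(
    percentiles: Mapping[str, float],
    cutoff: float = 10.0,
    min_domains: int = 2,
) -> bool:
    """Return True if the child scores below `cutoff` in at least `min_domains` domains.

    `percentiles` maps each domain name to a percentile rank (hypothetical input format).
    """
    low_domains = sum(1 for d in AEDC_DOMAINS if percentiles[d] < cutoff)
    return low_domains >= min_domains
```

A child below the 10th percentile in only one domain is not classified as developmentally vulnerable on the global measure, although that child would count as vulnerable in the corresponding single-domain secondary outcome.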
The National Assessment Program-Literacy and Numeracy (NAPLAN). NAPLAN is a school-based psychometric assessment, assessing 5 educational domains: grammar and punctuation, reading, writing, spelling, and numeracy [32]. The study cohort's grade 3 NAPLAN (fourth year of primary school) results were investigated. For this study, an overall z-score was calculated and used as the primary outcome, with the individual domain z-scores examined as secondary outcomes. By a priori consensus, a mean z-score difference of 0.2 standard deviations was considered to be clinically relevant. Individual domain scores below the published national minimum standard (NMS) NAPLAN scores for each year and for each domain were analysed as secondary (binary) outcomes.
Covariates. Covariates to be considered for inclusion in the statistical analysis models were decided a priori by the authorship team, whose expertise included epidemiology, perinatology, reproductive endocrinology, and education. These covariates included child's sex (as assigned at birth), child's age in years at assessment, language background other than English (LBOTE), maternal age (at birth of the child), parity, maternal and paternal highest level of education obtained, and socioeconomic status [33]. Gestational age at birth, mode of delivery, and birthweight were considered to be mediators on the causal pathways of interest and were therefore not adjusted for in this analysis. A directed acyclic graph (DAG) was created to describe the structure of the relationships between all variables and identify the adjustment variable set, in line with the methodology recommendations of Tennant and colleagues [34]. Our prespecified statistical analysis plan (SAP) and the DAG were agreed upon and signed off by all authors in May 2020, prior to the commencement of data analysis (Protocol in S2 File).
Linkage. Administrative record linkage techniques were employed to match cases with the exposure (conception via IVF) through to childhood outcome data. Data linkage was performed by the Centre for Victorian Data Linkage (CVDL), a third-party government-funded data linkage unit [35]. Probabilistic linkage was performed between the 5 individual databases: birth records, birthing outcomes, IVF records, AEDC, and NAPLAN. Post-linkage data were manually screened for false matches using secondary variables (e.g., residential postcode). False matches and duplicates were removed (Table A in S3 File outlines number and percentage of successful linkages).
Two separate, linked study populations were identified: children with a linked AEDC record and children with a linked NAPLAN record. These 2 cohorts were analysed and are reported separately. Some children were included in both cohorts.

Causal assumptions
The average treatment effect (ATE) estimand used in this study is based on the potential outcomes framework. If a set of assumptions is met, a causal interpretation can be made. The causal assumptions are counterfactual consistency, ignorability (conditional exchangeability), and positivity. Counterfactual consistency means that the definition of exposure is consistent for all individuals. Ignorability states that treatment assignment can be considered random after controlling for (conditioning on) a set of covariates [36]. By identifying confounding variables and, importantly, the structure of the relationships between variables via a DAG (S1 Fig), and by performing appropriate statistical modelling to balance the population (for example, inverse probability weighting), observed populations can be considered exchangeable or "unconfounded." Exchangeability requires that there are no important unmeasured confounders; this assertion is untestable. The positivity assumption means that for all observations, the conditional probability of being exposed (receiving treatment/no treatment) is greater than zero. This is likely violated if overlap of the control and exposed populations is poor [26].
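For readers who prefer notation, the estimand and the three assumptions can be written compactly. The symbols are assumptions introduced here for illustration and do not appear in the source: A is the exposure (1 = IVF conception, 0 = spontaneous conception), Y the observed outcome, Y^{a} the potential outcome under exposure level a, and L the set of adjustment covariates.

```latex
\begin{align*}
\text{ATE} &= E\left[Y^{a=1}\right] - E\left[Y^{a=0}\right] \\
\text{Consistency:}\quad & Y = Y^{a} \quad \text{whenever } A = a \\
\text{Conditional exchangeability:}\quad & Y^{a} \perp\!\!\!\perp A \mid L \quad \text{for } a \in \{0, 1\} \\
\text{Positivity:}\quad & 0 < \Pr(A = 1 \mid L = l) < 1 \quad \text{for all } l \text{ with } \Pr(L = l) > 0
\end{align*}
```

Written this way, positivity requires that both exposure levels have non-zero probability within every covariate pattern, which is the condition that poor overlap between exposed and control populations would violate.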
To best emulate a target trial, it must be possible for all participants to potentially receive both treatments. To ensure the assumptions underlying our causal approach were as robust as possible, we considered our observational data in direct comparison with the conditions of a target trial (Table 1).

Handling of missing data
The proportions of missing data are described in Table 2. Data were missing for outcomes and covariates; there were no missing exposure data. Missing covariates and outcomes that were considered to be either missing completely at random or missing at random were imputed. Children identified as having special needs are not allocated an AEDC domain category and thus their outcome data are missing. Likewise, children who attend school but have a disability that precludes them from being able to appropriately participate in the NAPLAN are exempt from sitting the test; by definition, these children are below the NAPLAN national minimum standard for each domain from which they have been excluded. Outcome data for these children for both AEDC domain categories and NAPLAN z-scores were considered missing not at random. In the analysis of all primary and secondary AEDC outcomes, children with special needs have been included and assumed to be "developmentally vulnerable." In the analysis of NAPLAN outcomes, "exempt" children have also been included and allocated either the lowest possible test z-score or deemed to be below the national minimum standard for binary NAPLAN domain outcomes.
All covariates in the analysis model that had missing data were imputed, even if the proportion missing was very low. For the AEDC analysis, imputed covariates included parity, age at assessment, maternal education, Socio-Economic Indexes for Areas (SEIFA), and outcome score. For the NAPLAN analysis, imputed covariates included parity, age at assessment, maternal education, paternal education, SEIFA, and outcome score. Maternal body mass index (BMI) was excluded from imputation and analysis because the missingness was too high (>50%). For AEDC imputation models, second parent education level (42% missing) was excluded due to non-convergence when included in the imputation model.
Multiple imputation of missing data was performed under a fully conditional specification using a predictive mean model for continuous and unordered categorical covariates and a logistic model for binary covariates, with standard errors (SEs) accounting for maternal clustering (Methods, Table 2 and Figs A-C in S5 File). The model contained outcome, exposure, model covariates, and auxiliary variables (AEDC: remote locality, Aboriginal and Torres Strait Islander (ATSI) status, and maternal country of origin; NAPLAN: ATSI status and maternal country of origin) along with interaction terms (exposure-parity, exposure-maternal age, exposure-age at assessment, gender-age at testing) and 1 higher order term (test age squared). At 20 imputations, the Monte Carlo errors were less than 10% of the corresponding SE for all covariates. Each imputation model was subjected to the recommended diagnostic tests [37].

Statistical analysis
Descriptive statistics were calculated and are reported for each cohort by IVF exposure status, according to type and distribution of data.
Treatment effect size modelling. All multivariate models were adjusted for the listed covariates identified in the prespecified SAP, except for (1) maternal BMI; and (2) second parent education level, for AEDC outcome models only.
For each of the imputed datasets, the predicted probability of exposure or PS and associated inverse probability weight (IPW = 1/PS for exposed children and 1/(1 − PS) for controls) were estimated using a logistic regression model, conditional on all analysis model covariates [38]. These weights were then stabilised by including as a factor in the numerator the proportion of each treatment group within the population, i.e., the prevalence of IVF and spontaneous conception [39]. Diagnostic tests performed after planned treatment effect modelling (Figs A-D in S6 File) demonstrated poor overlap of exposed and non-exposed cohorts. We therefore restricted our analysis to the IVF population whose weights overlapped with the control group to ensure that the overlap ("positivity") assumption was not violated. This reduced the IVF cases to 31.6% of the original AEDC cohort and 22.3% of the NAPLAN cohort. For each covariate, the standardised mean difference between the exposure arms was calculated to assess if balance between weighted pseudo-populations was achieved (Figs A and B in S8 File).
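The weighting steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: propensity scores are taken as already estimated, the overlap rule shown (keeping units inside the propensity score range shared by both arms) is one common choice and may differ from the trimming rule used in the study, and all function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def stabilised_weights(treated: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Stabilised inverse probability weights.

    Exposed units get P(A=1) / PS; control units get P(A=0) / (1 - PS).
    Assumes every PS lies strictly between 0 and 1.
    """
    p_treated = sum(treated) / len(treated)
    return [
        p_treated / p if a == 1 else (1.0 - p_treated) / (1.0 - p)
        for a, p in zip(treated, ps)
    ]


def overlap_mask(treated: Sequence[int], ps: Sequence[float]) -> List[bool]:
    """Flag units whose PS lies inside the range observed in both arms (assumed rule)."""
    ps_t = [p for a, p in zip(treated, ps) if a == 1]
    ps_c = [p for a, p in zip(treated, ps) if a == 0]
    lo, hi = max(min(ps_t), min(ps_c)), min(max(ps_t), max(ps_c))
    return [lo <= p <= hi for p in ps]


def weighted_smd(x: Sequence[float], treated: Sequence[int], w: Sequence[float]) -> float:
    """Weighted standardised mean difference of covariate x between exposure arms."""

    def moments(arm: int):
        pairs = [(xi, wi) for xi, a, wi in zip(x, treated, w) if a == arm]
        total = sum(wi for _, wi in pairs)
        mean = sum(xi * wi for xi, wi in pairs) / total
        var = sum(wi * (xi - mean) ** 2 for xi, wi in pairs) / total
        return mean, var

    m1, v1 = moments(1)
    m0, v0 = moments(0)
    # Pooled SD across arms; assumes x varies in at least one arm.
    return (m1 - m0) / sqrt((v1 + v0) / 2.0)
```

Passing a vector of ones as `w` gives the unweighted standardised mean difference, so the same function can be used to compare covariate balance before and after weighting.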
For each imputed dataset, a doubly robust inverse-probability-weighted regression adjustment (IPWRA) model [38,39] was then used to estimate the respective potential outcome means (POM) followed by (1) the risk difference (RD) and relative risk (RR) for binary outcomes (AEDC and NAPLAN); and (2) mean differences (MD) for continuous outcomes (NAPLAN z-score).
Finally, estimates for each imputed dataset were pooled to provide overall ATE with associated 95% confidence limits using Rubin's method.
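The pooling step can be sketched as follows. This minimal example combines per-imputation estimates and standard errors using Rubin's rules; the function name is hypothetical, and the confidence interval uses a normal approximation as a simplifying assumption (statistical packages typically use a t reference distribution with Rubin's degrees of freedom).

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float],
    ses: Sequence[float],
    level: float = 0.95,
) -> Tuple[float, float, Tuple[float, float]]:
    """Pool m imputation-specific estimates with Rubin's rules (requires m >= 2)."""
    m = len(estimates)
    pooled = mean(estimates)                   # pooled point estimate
    within = mean([se ** 2 for se in ses])     # average within-imputation variance
    between = variance(estimates)              # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    se = sqrt(total)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # normal approximation (assumption)
    return pooled, se, (pooled - z * se, pooled + z * se)
```

The between-imputation term is what widens the interval to reflect uncertainty about the missing values; with identical estimates across imputations the pooled SE reduces to the average within-imputation SE.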
Provided the assumptions outlined above are satisfied, the estimates generated from these analyses can be interpreted as the population average causal effect, that is, the mean effect on the outcome if the treatment was applied to the entire population and contrasted with the outcome if the entire population received the control condition.
Clustering. Clustering of data within mothers due to more than 1 singleton birth during the study period was accounted for in the imputation models, the calculation of inverse probability weights and estimation of the treatment effect by using robust SEs.
Sensitivity analyses. Sensitivity analyses were also performed to address identified sources of potential bias. For both AEDC (special needs status) and NAPLAN (exempt status) cohorts, sensitivity analyses were performed: (1) by excluding these cases completely; and (2) by imputing their outcomes (Fig A in S4 File). Targeted maximum likelihood estimation (TMLE) modelling (a machine learning ensemble that is less sensitive to violations of positivity and does not require data distribution assumptions) was undertaken for comparison [40]. Additionally, calculation of E-values for our 2 primary outcomes was performed to quantify the magnitude of unobserved bias required to alter our findings.
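As an illustration of the E-value calculation, the sketch below applies the standard formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)), after inverting risk ratios below 1. The function name is hypothetical. The inputs and any effect-size conversions behind the E-values reported in this study are not described in this section, so the sketch does not attempt to reproduce them.

```python
from math import sqrt


def e_value(rr: float) -> float:
    """E-value for a risk ratio: the minimum strength of association (on the
    risk ratio scale) that an unmeasured confounder would need with both
    exposure and outcome to fully explain away the observed risk ratio."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted before applying the formula
    return rr + sqrt(rr * (rr - 1.0))
```

A risk ratio of exactly 1 returns an E-value of 1, meaning no unmeasured confounding is needed to explain a null estimate; the same function can be applied to a confidence limit instead of the point estimate.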

Ethics/Governance
Ethical approval for the project was obtained from Mercy, Monash Health and Melbourne IVF Health Human Research Ethics Committees. Each data custodian provided contractual approval for data access and data linkage. The CVDL approved the project and performed the linkage.

Results
The total cohort included 585,659 singleton births in Victoria between 2005 and 2014. Among this cohort, 173,200 children, including 4,697 IVF births, were linked to AEDC outcome data. Additionally, 342,311 children, including 8,976 IVF births, were linked to NAPLAN data (Fig 1). Overall, a total of 11,059 IVF-conceived children and 401,654 spontaneously conceived children were included in the study (2,614 IVF cases and 100,184 controls were in both outcome cohorts). We estimate that our study cohort includes >95% of IVF conceptions during the study timeframe (Tables A and B in S3 File). Analysis of the linked and non-linked cases showed little evidence of association between linkage and exposure status (χ² p = 0.80); that is, IVF cases were just as likely to be included in the final linked cohort as controls. There were no births from 2014 that linked to outcome data.

PLOS MEDICINE
School-age childhood outcomes following IVF conception

Baseline population characteristics differed considerably between the 2 exposure groups (Table 2). Compared with spontaneously conceived controls, children conceived via IVF had older, more highly educated parents and mothers with lower parity. IVF-conceived children resided in postal areas with higher socioeconomic ranking and were less likely to be from non-English speaking backgrounds. Age at assessment was similar between the exposure groups.

Primary outcome
Our findings suggest no causal effect of IVF conception on developmental vulnerability, with 13.6% of IVF-conceived children predicted to be developmentally vulnerable (<10th percentile in 2 or more of the 5 AEDC domains) compared with 13.9% of spontaneously conceived children. The adjusted RD was −0.3%, indicating that the predicted proportion of developmentally vulnerable children was 0.3 percentage points lower among those conceived by IVF than among those conceived spontaneously. However, the 95% CI (−3.7% to 3.1%) indicates this result is indistinguishable from zero. Similarly, the adjusted relative risk showed no detectable difference in risk of developmental vulnerability, with IVF-conceived children 3% less likely to be developmentally vulnerable than spontaneously conceived children (RR 0.97, 95% CI: 0.77 to 1.25) (Table 3).

Secondary outcomes
For secondary outcomes, we examined each of the 5 AEDC domains individually. The unadjusted observed results and causal model results for each individual domain are reported in Table 3. There were no differences between IVF- and spontaneously conceived children in adjusted risk difference for any of the individual AEDC domains.

Missing data
Outcome data were missing for 5.6% of the AEDC-linked cohort. The vast majority (92%) of these missing cases were children with special needs (5.2% of overall cohort). There was no evidence of an association between the presence of missing outcome and exposure status (χ² p = 0.68). Sensitivity analysis was performed by (1) excluding children with special needs; and (2) including these children, with multiple imputation of their missing outcomes (Tables A and B in S8 File). Most covariates had minimal or no missing data (<1.0%). Maternal education level was missing for 30.5% and maternal post-school education was missing for 31.6%.

Primary outcome
Our findings indicate the causal effect of IVF conception on overall NAPLAN z-score was indistinguishable from zero. The predicted outcome mean z-score was 0.013 (SE 0.024) for IVF-conceived children and −0.016 (SE 0.002) for spontaneously conceived controls, with an adjusted mean difference of 0.030 (95% CI −0.018 to 0.077) (Table 4).

Secondary outcomes
For secondary outcomes, we examined individual NAPLAN domain z-scores (Table 4). IVF-conceived children performed better on average in measures of writing than their spontaneously conceived peers, with a z-score mean difference of 0.068 (95% CI 0.004 to 0.132), but this is unlikely to be a clinically important difference. The estimated effect is less than 0.07 of a standard deviation, and a difference of 0.2 standard deviations or greater was determined a priori as representing a finding of importance.
Additionally, for each domain, a binary outcome (domain scores above or below the national minimum standard) was examined. In 4 of 5 domains (numeracy, reading, spelling, and writing), IVF-conceived children were less likely to be below the national minimum standard compared with their spontaneously conceived peers (Table 5). For these 4 domains, the RD was between −0.7% and −1.25%. In absolute terms, this equates to approximately 1 additional IVF-conceived child, for every 100, predicted to score above the national minimum standard compared with their spontaneously conceived peers.

Missing data
Spontaneously conceived children were more likely to have missing NAPLAN data (7.6%) than IVF-conceived children (5.9%, χ² p < 0.001). During the primary analysis, missing outcomes related to a child being absent or withdrawing from the test were imputed. The results presented include 7,222 children who were exempt from sitting the NAPLAN, with their results set to the lowest possible outcome score. Sensitivity analysis was performed by (1) excluding these children; and (2) including the exempt cases, with multiple imputation of their missing outcomes. There was no meaningful difference in the results (Tables A and B in S9 File). Most covariates had minimal or no missing data (<4.0%). Second parent school education level was missing in 13.8% of cases and post-school education missing in 15.4% of cases.

Table 3. Results of final causal model.
Column groups: Non-imputed crude data; Imputed data, causal model (a).
Children with missing outcome data identified as having special needs (5.2%) are included; their outcome category is assumed to be "developmentally vulnerable."
a Causal model: multiply imputed data (imputation of covariates and outcomes), pooled estimates; doubly robust method: regression adjustment model with stabilised inverse probability weighting, trimmed for complete weight overlap. Variables included in model: sex at birth, age at assessment, language background other than English, socioeconomic status, maternal age, parity, and education.
b Small variation in case number for each of the 20 imputation datasets.
https://doi.org/10.1371/journal.pmed.1004148.t003
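Pooling estimates across multiply imputed datasets is commonly done with Rubin's rules. The following is a minimal sketch under that assumption; the text does not state the exact pooling procedure used, the function name is illustrative, and the input values are hypothetical (5 imputations are shown for brevity, whereas the analysis used 20).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical mean differences and squared standard errors from 5 imputations.
estimates = [0.028, 0.031, 0.030, 0.033, 0.029]
variances = [0.0006, 0.0006, 0.0006, 0.0006, 0.0006]
pooled, se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.4f}, pooled SE = {se:.4f}")
```

The pooled standard error exceeds the within-imputation standard error whenever the estimates vary between imputations, which is how the uncertainty due to missing data is carried into the final interval.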

Sensitivity analyses
To validate our analysis model, we re-examined our AEDC primary outcome and the NAPLAN binary domain outcomes using TMLE modelling. Results from the TMLE model did not meaningfully differ from the findings of the primary analysis (Table A in S10 File). An E-value was estimated for both primary outcomes and was found to be 1.90 and 1.77 for AEDC and NAPLAN outcomes, respectively, suggesting that an unknown bias of sufficient magnitude to change the study findings is unlikely (Figs A and B in S11 File).
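For readers unfamiliar with E-values, the standard closed-form expression for a risk ratio can be sketched as follows. This illustrates the formula only and does not attempt to reproduce the values of 1.90 and 1.77, whose exact inputs are given in S11 File; the function name and example risk ratios are hypothetical.

```python
import math


def e_value(risk_ratio):
    """E-value for a risk ratio using the standard closed-form expression.

    Ratios below 1 are inverted first so the formula applies symmetrically.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


# Hypothetical inputs chosen only to illustrate the formula.
for rr in (1.0, 1.5, 0.5):
    print(rr, round(e_value(rr), 2))
```

A larger E-value means that an unmeasured confounder would need a stronger association with both exposure and outcome to explain away the estimate.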

Discussion
Using a causal inference approach, we found no effect of IVF conception on developmental vulnerability at school entry in Victorian children born between 2005 and 2014. Additionally, IVF-conceived children performed as well as their spontaneously conceived peers in school-based psychometric testing at age 7 to 9 years.
For the first time, our study has estimated the causal effect of IVF conception on global childhood development at school entry and educational outcomes at primary school, under the assumptions of causal inference. Using an updated epidemiological approach [25], this study provides robust evidence about the longer term implications of IVF conception. The findings of this study offer timely reassurance about the impact of IVF conception on the developmental and educational outcomes at primary school age of the children conceived. Neither the outcomes of developmental vulnerability at school entry nor educational achievement at age 7 to 9 differed substantially between IVF- and spontaneously conceived children. Among 4 out of 5 NAPLAN individual domain national minimum standard results, there was a trend towards better performance in the IVF cases, but the clinical and social implications of these findings are difficult to quantify.
Two large Scandinavian studies have reported on childhood outcomes following IVF conception. Norrman and colleagues found that, among their cohort of just over 8,000 IVF-conceived children, IVF-conceived children performed worse on school-based assessment in year 9 [18]. Wienecke and colleagues reported that IVF-conceived children had poorer school performance than controls and that spontaneously conceived children of subfertile parents also had poorer outcomes [19]. By examining a subpopulation of spontaneously conceived children of subfertile parents, the authors of this Danish study concluded that the IVF process itself was not responsible for the differences demonstrated [19].
These past studies are limited by examining historical birth cohorts dating back prior to the year 2001. Our study examines a more contemporary birth cohort (2005 to 2014), which is important given the advances in artificial reproductive techniques that have occurred since the turn of the century. IVF technologies that have evolved since this time include the introduction of blastocyst culture, vitrification, and single-embryo transfer [4,43]. Thus, our study findings are more generalisable to contemporary fertility practice. Importantly, our use of updated epidemiological and statistical methods ensures that we have estimated effects that have a causal interpretation. It is important that our methods are replicated in future studies to strengthen the existing evidence base.
Given the use of observational data, there were missing data and inherent differences in the covariate profile of the exposure cohorts. An a priori SAP was developed to overcome these limitations. First, inverse probability weighting with regression adjustment was used to mimic exchangeable treatment and control comparison groups, similar to those that would be generated by randomisation in a controlled trial. The success of this procedure is demonstrated by achieving adequate covariate balance and thus sufficient overlap of covariate distributions between exposure groups after inverse probability weighting (Figs A-D in S6 File). Second, we sought to mitigate the potential biases resulting from missing data. In order to do this, we performed multiple imputation of covariates included in our model and then compared the results of analyses that were based on complete cases with those of multiply imputed datasets (Tables A and B in S12 File).
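The weighting and balance-checking step described above can be illustrated with a small simulation. This is a sketch on synthetic data, not the study's analysis: it uses a single binary covariate, estimates the propensity score as the exposure proportion within each covariate stratum, and omits the regression adjustment component. All names and parameter values are hypothetical.

```python
import random
from math import sqrt

random.seed(0)
n = 20_000

# Simulated binary covariate (e.g. an older-mother indicator) that raises the
# chance of exposure. All values are synthetic.
x = [1 if random.random() < 0.3 else 0 for _ in range(n)]
treated = [1 if random.random() < (0.10 if xi else 0.02) else 0 for xi in x]


def stratum_rate(level):
    """Exposure proportion within one stratum of the covariate."""
    members = [t for xi, t in zip(x, treated) if xi == level]
    return sum(members) / len(members)


# Propensity score: stratum-specific exposure proportion.
rate = {0: stratum_rate(0), 1: stratum_rate(1)}
ps = [rate[xi] for xi in x]

# Stabilised weights: marginal exposure probability over the propensity score.
p_treated = sum(treated) / n
weights = [p_treated / p if t else (1 - p_treated) / (1 - p)
           for t, p in zip(treated, ps)]


def weighted_mean_var(values, w):
    total = sum(w)
    mean = sum(v * wi for v, wi in zip(values, w)) / total
    var = sum(wi * (v - mean) ** 2 for v, wi in zip(values, w)) / total
    return mean, var


def smd(x, treated, w):
    """Weighted standardised mean difference of x between exposure groups."""
    groups = {}
    for level in (1, 0):
        idx = [i for i, t in enumerate(treated) if t == level]
        groups[level] = weighted_mean_var([x[i] for i in idx], [w[i] for i in idx])
    (mean_t, var_t), (mean_c, var_c) = groups[1], groups[0]
    return (mean_t - mean_c) / sqrt((var_t + var_c) / 2)


smd_before = smd(x, treated, [1.0] * n)
smd_after = smd(x, treated, weights)
print(f"SMD before weighting: {smd_before:.3f}, after: {smd_after:.3f}")
```

In this simplified setting the weighted standardised mean difference falls to essentially zero, which is the kind of covariate balance that the weighting is intended to achieve.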
It is possible that unmeasured common cause confounders may have led to bias in estimating the ATEs. Many important factors (socioeconomic status, maternal age, and education) were identified a priori, measured, and included in the estimation procedure. Potential known but unmeasured sources of bias include subfertility and maternal BMI. Current evidence suggests that subfertility is likely to be associated with poorer childhood outcomes [19]. Consequently, had this variable been measured and included in our causal model, correcting for it would likely have favoured IVF-conceived children in our analysis. Maternal BMI is likely to have followed the same trend, with higher average BMI among the IVF group (after accounting for socioeconomic position) and high BMI being associated with poorer perinatal and childhood outcomes [44]. Other unmeasured variables may also have had an impact on the outcome. Factors such as childcare attendance or grandparent involvement will be preceded on causal pathways by covariates that were measured and included in the model, such as maternal age and socioeconomic status [45-47]. These factors were therefore considered to mediate rather than confound the relationship between these covariates and the outcome. Sensitivity analyses were performed to further evaluate unmeasured confounding, with E-values calculated for AEDC and NAPLAN primary outcomes. Within the limitations of E-values, these analyses indicate that it is unlikely that an unknown bias exists with the necessary magnitude of effect and prevalence to change our conclusions (Figs A and B in S11 File) [48].
Generalisation of our findings to all IVF births is a potential study limitation. As described in our Methods, observations with non-overlapping PSs were excluded from analysis in order to meet the assumption of positivity, required for causal inference under the potential outcomes framework. Generalisation of our findings to all IVF births therefore requires the consideration that the baseline characteristics of the population of interest are comparable to the IVF cases analysed in our final cohort.
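Excluding observations with non-overlapping propensity scores can be sketched as a restriction to the region of common support. The minimum-maximum rule below is one common choice and is an assumption here; the exact trimming rule used in the study is the one described in the Methods. The function name and the propensity scores are hypothetical.

```python
def trim_to_common_support(ps_treated, ps_control):
    """Keep only observations whose propensity score lies in the overlap region."""
    lower = max(min(ps_treated), min(ps_control))
    upper = min(max(ps_treated), max(ps_control))
    keep_treated = [p for p in ps_treated if lower <= p <= upper]
    keep_control = [p for p in ps_control if lower <= p <= upper]
    return keep_treated, keep_control, (lower, upper)


# Hypothetical propensity scores, for illustration only.
ps_ivf = [0.05, 0.12, 0.30, 0.62]
ps_spontaneous = [0.01, 0.04, 0.06, 0.11, 0.28, 0.33]
kept_ivf, kept_spontaneous, bounds = trim_to_common_support(ps_ivf, ps_spontaneous)
print(bounds, kept_ivf, kept_spontaneous)
```

Observations outside the overlap region have no comparable counterpart in the other exposure group, which is why results from the trimmed cohort apply most directly to populations with similar baseline characteristics.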
Due to the use of school-based outcome assessments, our cohort was limited to children attending school. AEDC, as a triennial assessment, limited our sample to children captured during assessment years, and the later years of our birth cohort had not yet reached the assessment age for NAPLAN outcomes to be captured. However, our study included 70% of the relevant birth cohort for the study timeframe and in the years where both AEDC and NAPLAN data were available, over 95% of the Victorian birth cohort was sampled (Table A in S3 File). The remaining approximately 5% of children not sampled represent failed linkages as well as excluded IVF conceptions (due to non-Victorian IVF or non-IVF-assisted reproduction). A small percentage will also represent children with a disability significant enough not to attend mainstream school, introducing potential selection bias. Importantly, however, our study was not designed to assess severe disability or developmental delay, but rather an overall measure of global development and school achievement.
Furthermore, through the examination of school-based outcomes, our study was inherently designed to examine outcomes for liveborn children. "Live birth bias," as it is known, is a recognised limitation of observational studies that investigate periconception and antenatal exposures [49]. For the purposes of this research question, the outcomes of failed conception, miscarriage, and stillbirth are considered alternative endpoints and less relevant to the research question, which aims to compare the school-age outcomes of children born following IVF conception with those who were conceived without assistance.
Under the specified assumptions, this analysis has demonstrated that there is no causal effect within the population studied of IVF conception on early childhood developmental vulnerability and school-age educational outcomes. Compared with spontaneously conceived children, children conceived by IVF were no more likely to be developmentally vulnerable at school entry and had equivalent numeracy and literacy performance by age 7 to 9 years. These findings provide important reassurance for current and prospective parents and their treating clinicians.
Tables A and B. Table A. Sensitivity analysis-AEDC (special needs multiply imputed).